Data mining with a parallel rule induction system based on gene expression programming

Authors

  • Wagner Rodrigo Weinert
  • Heitor Silvério Lopes
Abstract

This paper reports a parallel rule induction system based on gene expression programming (GEP), developed for data classification. The parallel processing environment was implemented on a cluster using a message-passing interface. A master-slave GEP was implemented following the Michigan approach, in which each individual encodes a single rule of the classification solution. A multiple master-slave system (islands) was also implemented to observe the co-evolution effect. Experiments were performed on ten datasets, and the algorithms were systematically compared with C4.5. Results were analysed as a multi-objective problem, taking into account both the predictive accuracy and the comprehensibility of the induced rules. Overall, the results indicate that the proposed system achieves better predictive accuracy with shorter rules than C4.5.
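The abstract names three mechanisms without giving details: GEP chromosomes, one rule per individual (Michigan approach), and a master that hands fitness evaluations to slaves. Below is a minimal Python sketch of how these pieces could fit together, under our own assumptions; it is not the authors' implementation. Rule antecedents are Boolean expressions encoded as K-expressions (Karva notation, the standard GEP gene encoding); the function set `FUNCTIONS` and the `(attribute, operator, threshold)` terminals are hypothetical; and fitness is taken as sensitivity × specificity, a common choice for rule induction that the abstract does not specify. A thread pool stands in for the MPI slave processes on the cluster: it shows the division of labour, not real speed-up.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical Boolean function set with arities; terminals are assumed
# attribute tests written as (attribute, operator, threshold) tuples.
FUNCTIONS = {"AND": 2, "OR": 2, "NOT": 1}


def decode_karva(gene):
    """Decode a K-expression (symbols listed level by level) into a tree.

    Each node is [symbol, children]. Symbols left over once the tree is
    complete are ignored, like the non-coding region of a GEP gene.
    """
    root = [gene[0], []]
    level, i = [root], 1
    while level:
        next_level = []
        for node in level:
            for _ in range(FUNCTIONS.get(node[0], 0)):
                child = [gene[i], []]
                i += 1
                node[1].append(child)
                next_level.append(child)
        level = next_level
    return root


def covers(node, record):
    """Return True if the rule antecedent rooted at node matches record."""
    symbol, children = node
    if symbol == "AND":
        return covers(children[0], record) and covers(children[1], record)
    if symbol == "OR":
        return covers(children[0], record) or covers(children[1], record)
    if symbol == "NOT":
        return not covers(children[0], record)
    attribute, op, threshold = symbol
    if op == ">":
        return record[attribute] > threshold
    if op == "<=":
        return record[attribute] <= threshold
    raise ValueError(f"unknown operator {op!r}")


def rule_fitness(gene, dataset, target):
    """Slave task: score one rule 'IF gene THEN target' as Se * Sp (assumed)."""
    tree = decode_karva(gene)
    tp = fp = tn = fn = 0
    for record, label in dataset:
        hit = covers(tree, record)
        if hit and label == target:
            tp += 1
        elif hit:
            fp += 1
        elif label == target:
            fn += 1
        else:
            tn += 1
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return sensitivity * specificity


def master_evaluate(population, dataset, target, workers=4):
    """Master: farm out one fitness evaluation per rule, gather in order."""
    n = len(population)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(rule_fitness, population,
                             [dataset] * n, [target] * n))


if __name__ == "__main__":
    data = [({"x": 3.0, "y": 0.5}, "A"), ({"x": 3.0, "y": 2.0}, "B"),
            ({"x": 1.0, "y": 0.5}, "B"), ({"x": 2.5, "y": 0.0}, "A")]
    # Michigan approach: each individual is one candidate rule for class "A".
    population = [["AND", ("x", ">", 2.0), ("y", "<=", 1.0)],
                  ["OR", ("x", ">", 2.0), ("y", "<=", 1.0)],
                  ["NOT", ("x", ">", 2.0), ("y", "<=", 1.0)]]
    print(master_evaluate(population, data, "A"))  # [1.0, 0.0, 0.0]
```

The `AND` rule covers exactly the class-"A" records, so it scores 1.0; the `OR` rule covers everything (specificity 0) and the `NOT` rule covers no "A" record (sensitivity 0). In the island variant mentioned in the abstract, several such master-slave populations would evolve side by side and would typically exchange individuals from time to time; that migration step is not shown here.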

Similar resources

Prediction of Blasting Cost in Limestone Mines Using Gene Expression Programming Model and Artificial Neural Networks

The use of blasting cost (BC) prediction to achieve optimal fragmentation is necessary in order to control the adverse consequences of blasting such as fly rock, ground vibration, and air blast in open-pit mines. In this research work, BC is predicted through collecting 146 blasting data from six limestone mines in Iran using the artificial neural networks (ANNs), gene expression programming (G...

A Novel Method for Selecting the Supplier Based on Association Rule Mining

One of the important problems in supply chain management is supplier selection. In a company, there is a massive amount of data from various departments, so extracting knowledge from the company’s data is very complicated. Many researchers have addressed this problem with methods such as fuzzy set theory, goal programming, multi-objective programming, linear programming, mixed integer programming, analyti...

A Genetic Programming Framework for Two Data Mining Tasks: Classification and Generalized Rule Induction

This paper proposes a genetic programming (GP) framework for two major data mining tasks, namely classification and generalized rule induction. The framework emphasizes the integration between a GP algorithm and relational database systems. In particular, the fitness of individuals is computed by submitting SQL queries to a (parallel) database server. Some advantages of this integration from a ...

A Survey of Parallel Data Mining

With the fast, continuous increase in the number and size of databases, parallel data mining is a natural and cost-effective approach to tackle the problem of scalability in data mining. Recently there has been considerable research on parallel data mining. However, most projects focus on the parallelization of a single kind of data mining algorithm/paradigm. This paper surveys parallel data ...

Genetic Programming Based Formulation to Predict Compressive Strength of High Strength Concrete

This study introduces two models based on Gene Expression Programming (GEP) to predict the compressive strength of high strength concrete (HSC). The composition of HSC was simplified to a mixture of six components (cement, silica fume, superplasticizer, water, fine aggregate and coarse aggregate). The 28-day compressive strength was taken as the prediction target. Data on 159...

Publication date: 2011